Genome-wide survey of protein families and superfamilies
Author
Abstract
The availability of whole-genome sequences for several model organisms makes it possible to analyse protein families systematically and thereby improve the accuracy of protein function prediction. Assigning gene products to pre-existing families is a popular approach to function prediction. However, despite the huge amount of protein sequence information available, nearly half of all gene products cannot be assigned a structure or function by direct methods, which makes bioinformatics approaches necessary. In this review, we survey early whole-genome computational analyses of model organisms performed with bioinformatics algorithms such as fold recognition methods and ultrasensitive sequence search procedures. Fold recognition methods apply inverse folding approaches to recognize the fold of gene products. Heuristic sequence search algorithms, such as PSI-BLAST, employ rapid but sensitive strategies to recognize relationships between gene products and pre-existing folds. We also discuss the applications and analyses that such genome-wide surveys make possible.
Similar resources
Towards a natural taxonomy of proteins and protein families
Computer analysis of complete prokaryotic genomes shows that microbial proteins are in general highly conserved: ~70% of them contain ancient conserved regions. This allows us to delineate families of orthologs across a wide phylogenetic range and, in many cases, predict protein functions with considerable precision. Sequence database searches using newly developed, sensitive algorithms result...
The Evolution and Diversity of DNA Transposons in the Genome of the Lizard Anolis carolinensis
DNA transposons have considerably affected the size and structure of eukaryotic genomes and have been an important source of evolutionary novelties. In vertebrates, DNA transposons are discontinuously distributed due to the frequent extinction and recolonization of these genomes by active elements. We performed a detailed analysis of the DNA transposons in the genome of the lizard Anolis caroli...
GeMMA: functional subfamily classification within superfamilies of predicted protein structural domains
GeMMA (Genome Modelling and Model Annotation) is a new approach to automatic functional subfamily classification within families and superfamilies of protein sequences. A major advantage of GeMMA is its ability to subclassify very large and diverse superfamilies with tens of thousands of members, without the need for an initial multiple sequence alignment. Its performance is shown to be compara...
Three monophyletic superfamilies account for the majority of the known glycosyltransferases.
Sixty-five families of glycosyltransferases (EC 2.4.x.y) have been recognized on the basis of high-sequence similarity to a founding member with experimentally demonstrated enzymatic activity. Although distant sequence relationships between some of these families have been reported, the natural history of glycosyltransferases is poorly understood. We used iterative searches of sequence database...
Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies
Due to the rapid release of new data from genome sequencing projects, the majority of protein sequences in public databases have not been experimentally characterized; rather, sequences are annotated using computational analysis. The level of misannotation and the types of misannotation in large public databases are currently unknown and have not been analyzed in depth. We have investigated the...
Journal title:
Volume, issue:
Pages: -
Publication date: 2007